A detailed land use/land cover map for the European Alps macro region

Spatially and thematically detailed land use maps are of special importance to study and manage populated mountain regions. Due to the complex terrain, high elevational gradients as well as differences in land demand, these regions are characterized by a high density of different land uses that form heterogeneous landscapes. Here, we present a new highly detailed land use/land cover map for the areas included in the European Strategy for the Alpine Region. The map has a spatial resolution of up to 5 m and a temporal extent from 2015 to 2020. It was created by aggregating 15 high-resolution layers resulting in 65 land use/cover classes. The overall map accuracy was assessed at 88.8%. The large number of land use classes and the high spatial resolution allow easy customization of the map for research and management purposes, making it usable by a broad audience for various applications. Our map shows that by combining theme-specific “high-resolution” land use products to build a comprehensive land use/land cover map, a high thematic and spatial detail can be achieved.


Background & Summary
Land use/land cover (LULC) maps present information on the physical land types that characterize the surface of the earth (i.e., land cover) and describe how humans use this land (i.e., land use) 1. These maps make it possible to monitor land cover changes and land allocation for agriculture, urban development, nature conservation, etc., and to assess the provision of ecosystem services and habitats 2,3. The use of high-resolution LULC maps is particularly important in areas that are characterized by complex landscapes and unique geo-topographic conditions, such as mountain ranges. These areas face multiple challenges, such as biodiversity loss, a high vulnerability to climate change, and negative demographic trends, and therefore need accurate and up-to-date LULC information for their effective management [4][5][6].
The European Alps represent a unique environment characterized by a great variety of ecosystems and landscapes that are increasingly threatened by different pressures 7. Land use intensification in the valley bottoms is affecting the presence of green infrastructure elements such as hedgerows and riparian areas, leading to the isolation of natural habitats and a decrease in ecological connectivity 8. The increase in temperatures caused by climate change is progressively opening new areas at higher elevations to agriculture, causing the upward shift of economically valuable crops 9 as well as a natural shift in habitats 10. Rural abandonment is causing the progressive marginalization of large areas, while urban areas are experiencing intensive urbanization with a significantly growing number of inhabitants 11. To tackle these challenges, it is important to develop specific tools and data that inform policymaking, research, land planning and resource management 2.
The availability of LULC maps of the European Alps that have both a high thematic and a high spatial detail (i.e., maps characterized by a high spatial resolution and many LULC classes) is, however, limited. Indeed, even if the increased accessibility of "high-resolution" satellite imagery, of powerful computing capabilities, and of new computing techniques (e.g., deep learning) has brought new opportunities for the automated mapping of land cover 3, LULC maps of the Alps still usually fulfill only one of the two desired characteristics. An example of a thematically very detailed LULC map is the Corine Land Cover map (CLC) 12, which includes 44 LULC classes 13. However, from the spatial point of view, CLC has only a medium resolution (100 m, with a minimum mapping unit (MMU) of 25 ha), which limits its usability in mountain areas. Conversely, the map recently developed by Malinowski et al. 2020 has a high spatial resolution (10 m) but only 13 LULC classes 14. The same holds true for other recent LULC maps that include the European Alps [15][16][17]. To improve both the spatial and the thematic detail of existing LULC maps, researchers have developed various methodologies: Rosina et al. 18, for example, used a CLC refinement approach that integrates multiple datasets with higher spatial resolution and decreased the MMU from 25 to 1 ha, while Pigaiani & Batista e Silva 2021 19 applied a similar methodology, increasing the spatial resolution to 50 m. Using similar procedures, many other LULC maps have been produced, mostly focusing on the national and subnational level [20][21][22][23]. However, there has been no attempt to create a specific LULC map focused on the entire Alps with both a high spatial and thematic resolution.
Here, we present the first spatially and thematically highly detailed LULC map for the European Alps. We collected, harmonized and combined freely available datasets from 11 different sources to build a high-resolution map that includes 65 different LULC classes. By including small LULC features, this map is intended to support a wide range of analyses spanning from research to land management and decision making. For example, the spatial impact of linear elements such as roads, rivers and hedges can be analyzed and included in ecological connectivity mapping models or ecosystem service assessments. Local administrations can also benefit from the high resolution of the map, which can support landscape planning and resource-efficient management.

Methods
As a reference to define the extent of the European Alps we used the area included in the European Strategy for the Alpine Region (EUSALP). This area covers a total surface of more than 440,000 km², including 7 nations and 48 administrative regions (Fig. 1).
The creation of the EUSALP map included the following main steps. First, we selected freely available datasets that covered our area of interest. Second, we adapted the retrieved datasets with minor alterations in order to combine high-resolution datasets from different sources. Third, we harmonized all the layers using the same spatial reference system and resolution. Fourth, we mosaicked the layers using a specific hierarchy based on codes given to each LULC class (Fig. 2). Finally, we validated the resulting map using an area-weighted confusion matrix approach.

Data selection.
In the first step, we collected all openly available LULC datasets that cover the whole EUSALP macro region. The following collection criteria were applied: a reference year between 2015 and 2020, a thematic accuracy higher than 80%, and a high spatial resolution (10 m). The selected data are presented in Table 1 (the area covered by the single datasets is shown in Figure S1).

Data adaptation.
For certain data layers (i.e., OSM Roads & Railways, EU Hydro, HRL Grassland) some adaptations were necessary prior to harmonization. Linear features (i.e., roads, railways) from OpenStreetMap (OSM) were converted into polygon features by assigning the width defined by the OSM specifications (6 m width for secondary and tertiary roads as well as tracks and field roads, 10 m width for primary roads and railways, 20 m width for motorways and trunks); all tunnels were excluded. The EU Hydro River polylines were converted into polygon features using a width according to the Strahler Stream Order 24. The HRL Grassland dataset 25 defines grasslands using only a binary grassland/non-grassland classification. To characterize their use intensity, we divided them into three LULC classes based on elevation and slope, using the following criteria: managed grassland (<2000 m elevation and <26° slope), seminatural grassland (<2000 m elevation and >26° slope), and Alpine natural grassland (>2000 m elevation) [26][27][28]. For the calculation we used the European Digital Elevation Model (EU-DEM), version 1.1 29.
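The adaptation rules described above can be illustrated with a minimal Python sketch. It is not the workflow used to build the map (which was carried out in ArcGIS); the dictionary keys are illustrative labels rather than OSM tag names, the function names are hypothetical, and the handling of values exactly at 2000 m or 26° is an assumption, since the criteria in the text leave these ties open.

```python
# Illustrative labels (not OSM tag names) mapped to the polygon widths given in the text.
FEATURE_WIDTH_M = {
    "motorway": 20, "trunk": 20,
    "primary road": 10, "railway": 10,
    "secondary road": 6, "tertiary road": 6, "track or field road": 6,
}


def buffer_distance_m(feature_type: str) -> float:
    """Return half the target width.

    Buffering a centre line by this distance on both sides yields a polygon
    of the full width listed in FEATURE_WIDTH_M.
    """
    return FEATURE_WIDTH_M[feature_type] / 2


def classify_grassland(elevation_m: float, slope_deg: float) -> str:
    """Split a binary grassland pixel into one of three use-intensity classes."""
    # Assumption: ties (exactly 2000 m or exactly 26 degrees) go to the upper class.
    if elevation_m >= 2000:
        return "Alpine natural grassland"
    if slope_deg >= 26:
        return "seminatural grassland"
    return "managed grassland"


print(buffer_distance_m("motorway"))
print(classify_grassland(1500, 10))
print(classify_grassland(1500, 30))
print(classify_grassland(2300, 5))
```

In a raster workflow the same two thresholds would be applied cell by cell to elevation and slope grids derived from the EU-DEM.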

Harmonization.
We harmonized all the layers using the same reference system and resolution to ensure the geographical consistency of the final dataset. We projected the selected raster datasets into the same spatial reference system (EPSG:3035, ETRS89/ETRS-LAEA) and then resampled them to a resolution of 5 m using the nearest neighbor algorithm to ensure that the original pixel values are preserved and no interpolated values are created. We also projected the vector-based datasets to EPSG:3035 and rasterized them at 5 m resolution. Next, we snapped all the layers to the same reference raster layer to ensure cell alignment.
Resolution: We did not perform resampling to improve the resolution of the input data, but to allow an increase in the thematic detail so that landscape features smaller than 100 m² and 10 m width (e.g., buildings, roads, hedgerows, small streams) can be represented on the final map. Therefore, a map resolution of 5 m can be expected only in and near buildings, roads and linear elements (which corresponds to approximately 15-20% of the map area).
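Why nearest neighbor resampling preserves class codes can be seen in a small Python sketch. It covers only the simplest case, an integer refinement factor on already aligned grids (e.g., 10 m to 5 m), and uses plain lists instead of a raster library; the function name and the toy codes are hypothetical.

```python
def resample_nearest(grid, factor):
    """Upsample a 2D grid of class codes by an integer factor.

    Each coarse cell is copied into factor x factor fine cells, so no new
    (interpolated) values can appear in the output.
    """
    fine = []
    for row in grid:
        fine_row = [value for value in row for _ in range(factor)]
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


coarse = [[1, 2],
          [3, 4]]                      # toy class codes at 10 m
fine = resample_nearest(coarse, 2)     # same codes at 5 m
for row in fine:
    print(row)
```

The set of values in the output is identical to that of the input, which is the property needed for categorical LULC data; an averaging or bilinear method would create codes that do not correspond to any class.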

Data structuring and classification.
We used the ESRI Land Cover Map 2020 30 as a base layer to build our LULC map, as it is the only selected land cover dataset with complete geographical coverage for the whole research area. We added land use information to this dataset using the data presented in Table 1. To combine the layers, we first assigned specific codes to each LULC class value in all datasets (Table 2). Reoccurring LULC types across different datasets were assigned the same code (since the MMU is very small and mostly pixel-based, no further harmonization steps of land use types were necessary). We then overlaid the data by applying a specific layer hierarchy (Table 3) following a decision tree based on data accuracy (i.e., level of thematic and spatial detail). By assigning the value of the highest-ranking layer, we could decide which information to show on the final map, control the uncertainties built into specific layers (e.g., presence of green linear elements in cultivated areas and grassland) and include small LULC features (e.g., roads, single buildings, small streams in forests or grassland). All the work was done using ArcGIS Desktop 10.8.
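The hierarchical overlay can be sketched as follows. This is an illustrative Python example, not the ArcGIS procedure itself: the layer order, the class codes 100, 300 and 400, and the use of 0 as "no data" are assumptions made for the example. For every cell, the value of the highest-ranking layer that contains data is kept, and the index of that layer is stored in a second grid, analogous to a data-source raster.

```python
NODATA = 0  # assumption: 0 marks cells where a layer holds no information


def mosaic_by_hierarchy(layers):
    """Combine aligned 2D grids ordered from highest to lowest rank.

    Returns the mosaicked class grid and a grid holding the index of the
    layer that supplied each cell (-1 where no layer had data).
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    classes = [[NODATA] * cols for _ in range(rows)]
    sources = [[-1] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for index, layer in enumerate(layers):
                if layer[r][c] != NODATA:
                    classes[r][c] = layer[r][c]
                    sources[r][c] = index
                    break  # higher-ranking layer wins, stop searching
    return classes, sources


roads = [[0, 100],
         [0, 0]]          # hypothetical high-ranking layer (small features)
base = [[300, 300],
        [400, 400]]       # hypothetical base layer with full coverage
classes, sources = mosaic_by_hierarchy([roads, base])
print(classes)
print(sources)
```

Because the base layer has complete coverage, every cell receives a value, while small features from higher-ranking layers overwrite it only where they are present.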

Data Records
We present an easily accessible and freely available high-resolution LULC map of the EUSALP region that can be used to support researchers and practitioners in the field of landscape planning and management. The data are freely available through the Figshare data publisher 31. The dataset includes two raster geospatial files that contain the EUSALP high-resolution LULC map and a reference to the source dataset used to define each of the pixel values. The file has a pre-built color palette included to classify the 65 classes of the LULC map. The files included are:

Technical Validation
The primary purpose of the present validation procedure is not to assess the individual LULC classes, but to ensure that the harmonization steps and hierarchy in combining the data are still capable of producing accurate LULC information, given that the map is built upon already validated and published input data. For more details on the validation and accuracy of the input data, see Table S2. The assessment of thematic accuracy was carried out following the procedure applied for validation of similar LULC products 32,33.
We applied a stratified random sampling design using the Eurostat LUCAS 2018 survey data points as the reference dataset 34 . In total, 32,227 LUCAS 2018 survey points are located within the EUSALP map extent. From these, a random selection of survey sites was made using the subset feature analysis tool in ArcGIS. The number of sites to be allocated to each LULC class was calculated as a function of their area proportion in the EUSALP map. In this way, the sampling design is not only systematic but also stratified. A minimum number of 20 sample units per LULC class was defined to ensure that even small strata were represented in the sample. However, for some strata there were no reference points available (41200, 42200). In the end, 2300 LUCAS 2018 points were randomly selected (see Figure S2).
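The allocation rule described above (sample size proportional to mapped area, with a minimum of 20 units per class, limited by the reference points actually available) can be sketched in Python. The class labels, area shares and point counts below are invented for illustration, and the rounding rule is an assumption.

```python
def allocate_samples(area_share, available, total, minimum=20):
    """Allocate sample units to strata (LULC classes).

    area_share: class -> proportion of mapped area (sums to 1)
    available:  class -> number of reference points located in that class
    total:      nominal sample size before the minimum is enforced
    """
    allocation = {}
    for code, share in area_share.items():
        target = max(minimum, round(total * share))   # proportional, at least `minimum`
        allocation[code] = min(target, available.get(code, 0))  # cannot exceed supply
    return allocation


# Invented example values, not taken from the EUSALP map.
area_share = {"A": 0.70, "B": 0.29, "C": 0.008, "D": 0.002}
available = {"A": 5000, "B": 2000, "C": 12, "D": 0}
allocation = allocate_samples(area_share, available, total=1000)
print(allocation)
```

Enforcing the minimum means that the realized sample size can differ from the nominal total, and strata without any reference points end up with no sample units, as reported for two classes in the text.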
An initial blind interpretation was performed, which consists of constructing the validation data without any knowledge of the map layer being evaluated. This was done by evaluating LULC on the reference points using the classification codes of the EUSALP LULC map. ESRI World Imagery (https://services.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer) and LUCAS 2018 thematic information were used for this first round of classification. As this method may underestimate the accuracy for complex and heterogeneous land use classes and potential land use changes (especially on arable land) or class definitions, we then used a plausibility approach, which is applied to all sample units that result in disagreement with the EUSALP LULC map. This step consists of checking both classified values (blind validation and EUSALP map) for plausibility within the accepted product specifications, without knowing the corresponding classification source.
The overall map accuracy was assessed using an error matrix approach 35. The producer accuracy (PA) and the user accuracy (UA) for each LULC class were evaluated in an area-weighted confusion matrix with a 95% confidence interval. We obtained an overall accuracy (OA) of 88.8% ± 1.8 for the plausibility approach (Tables 4, 6, S3), which is a good result that meets validation standards, even though the blind evaluation showed a substantially lower overall accuracy (64.8% ± 3.7) (Tables 5, S4). For classes 41200, 42200, 52100 and 32200 too few sample points were available, so these classes could not be properly validated 35. However, this is of little concern as these LULC classes cover only 0.06% of the total map area. Only 17 reference points could not be classified.
Table 4. Plausibility evaluation: Estimated error matrix based on Table S3 with cell entries expressed as the estimated proportion of area (%). Accuracy measures are presented with a 95% confidence interval.
The OA of the EUSALP LULC map is very similar to the OA of the various input datasets, and it would be very unlikely that the output is better than the input. Therefore, we are confident that the map creation approach was successful and that the created dataset meets accuracy standards.
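How an area-weighted confusion matrix leads to OA, UA and PA estimates can be sketched in Python. The sketch uses the common stratified estimators (sample counts per map class weighted by the mapped area share of that class, and a normal-approximation 95% interval for OA); whether these are exactly the formulas applied to the EUSALP map is an assumption, and the two-class counts and weights are invented.

```python
import math


def area_weighted_accuracy(counts, weights):
    """Estimate accuracy from a sample confusion matrix.

    counts[i][j]: sample units mapped as class i with reference class j
    weights[i]:   mapped area proportion of class i (weights sum to 1)
    Returns (overall accuracy, user accuracies, producer accuracies,
    half-width of the 95% confidence interval of the overall accuracy).
    """
    k = len(counts)
    row_n = [sum(row) for row in counts]
    # Estimated area proportions: p_ij = W_i * n_ij / n_i
    p = [[weights[i] * counts[i][j] / row_n[i] for j in range(k)] for i in range(k)]
    overall = sum(p[i][i] for i in range(k))
    user = [counts[i][i] / row_n[i] for i in range(k)]
    col_p = [sum(p[i][j] for i in range(k)) for j in range(k)]
    producer = [p[j][j] / col_p[j] for j in range(k)]
    variance = sum(
        weights[i] ** 2 * user[i] * (1 - user[i]) / (row_n[i] - 1) for i in range(k)
    )
    return overall, user, producer, 1.96 * math.sqrt(variance)


counts = [[45, 5],
          [10, 40]]       # invented sample counts (rows: map, columns: reference)
weights = [0.8, 0.2]      # invented mapped area shares
overall, user, producer, ci = area_weighted_accuracy(counts, weights)
print(round(overall, 3), [round(u, 3) for u in user], [round(x, 3) for x in producer], round(ci, 3))
```

In this toy case the unweighted proportion of correct samples is 0.85, while the area-weighted OA is 0.88, because the more accurately mapped class covers most of the area.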
Insight into the temporal extent of the LULC data is given by the EUSALP_LULC_data_sources.tif raster 31, which shows the reference year of each map cell. Information on the reference year exists for each input data layer except for OpenStreetMap.
Logical and format consistency of our map is ensured by the harmonization steps each data input file has undergone (the MMU is pixel-based, the coordinate reference system is set to EPSG:3035, and the pixel size is set to 5 m). Overlap cannot occur due to the final data format.
Positional accuracy could not be assessed due to missing reference data with sufficient spatial accuracy. However, all of the input data used have been evaluated for positional accuracy during the validation process.

Usage Notes
The EUSALP LULC map has a high potential for customization as the regrouping of the 65 LULC classes allows for interest-specific reclassifications in any GIS program. Due to the high level of detail, our map can be used even at the local scale, having a level of detail near artificial structures and settlements comparable to maps at 1:5,000 scale.
Table 6. Pixel count, total area, standard error of the adjusted area-estimate and 95% confidence interval for each acreage estimate of the EUSALP LULC Map classes. For easier interpretation, the road and railways, agricultural, green linear elements and river LULC classes were each aggregated into a single class.
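The interest-specific regrouping of classes mentioned in the Usage Notes amounts to a lookup-table reclassification. A minimal Python sketch is given here; the four class codes appear in the validation section, but their grouping into the target values 1 and 2 is purely illustrative and does not reflect the actual class definitions.

```python
def reclassify(grid, lookup, default=0):
    """Map each class code in a 2D grid to a user-defined group."""
    return [[lookup.get(code, default) for code in row] for row in grid]


# Hypothetical grouping: codes not listed in the lookup fall into group 0.
lookup = {41200: 1, 42200: 1, 52100: 2}
tile = [[41200, 42200],
        [52100, 32200]]
regrouped = reclassify(tile, lookup)
print(regrouped)
```

The same table-driven approach corresponds to the reclassification tools available in GIS software, where the lookup is supplied as a remap table.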